Search CORE

95 research outputs found

People-search : searching for people sharing similar interests from the web

Author: Li Quanzhi
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2007

On the Web, there are limited ways of finding people who share similar interests or background with a given person. The current methods, such as using regular search engines, are either ineffective or time-consuming. In this work, a new approach for searching the Web for people sharing similar interests, called People-Search, is presented. Given a person, finding similar people on the Web raises two major research issues: person representation and matching persons. In this study, a person representation method is proposed that uses a person's website to represent this person's interests and background. The design of the matching process takes person representation into consideration so that the same representation can be used when composing the query, which is also a personal website. Based on this person representation method, the main proposed algorithm integrates the textual content and hyperlink information of all the pages belonging to a personal website to represent a person and match persons. Other algorithms, based on different combinations of content, inlink, and outlink information of an entire personal website or only the main page, are also explored and compared to the main proposed algorithm. Two kinds of evaluations were conducted. In the automatic evaluation, precision, recall, F and Kruskal-Goodman gamma (Γ) measures were used to compare these algorithms. In the human evaluation, the effectiveness of the main proposed algorithm and two other important ones was evaluated by human subjects. Results from both evaluations show that the People-Search algorithm integrating content and link information of all pages belonging to a personal website outperformed all other algorithms in finding similar people on the Web.
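
As a rough illustration of the automatic evaluation mentioned in this abstract, the sketch below computes precision, recall and the balanced F measure for a single query. It is not the thesis's code: the function name `precision_recall_f` and the example identifiers are hypothetical, and the abstract does not say which F variant was used, so the balanced F1 is assumed here.

```python
def precision_recall_f(retrieved, relevant):
    """Return (precision, recall, F1) for one query's result set.

    retrieved: items returned by the system (hypothetical person IDs)
    relevant:  items judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Balanced F measure: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical example: four people returned, three judged similar.
p, r, f = precision_recall_f(["a", "b", "c", "d"], ["a", "b", "e"])
print(p, r, f)
```

Averaging these per-query values over a set of query websites would give one way to compare algorithms, under the assumptions stated above.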

Digital Commons @ New Jersey Institute of Technology (NJIT)

Data Sets: Word Embeddings Learned from Tweets and General Data

Author: Li Quanzhi
Liu Xiaomo
Nourbakhsh Armineh
Shah Sameena
Publication date: 03/05/2017

A word embedding is a low-dimensional, dense and real-valued vector representation of a word. Word embeddings have been used in many NLP tasks. They are usually generated from a large text corpus. The embedding of a word captures both its syntactic and semantic aspects. Tweets are short, noisy and have unique lexical and semantic features that are different from other types of text. Therefore, it is necessary to have word embeddings learned specifically from tweets. In this paper, we present ten word embedding data sets. In addition to the data sets learned from just tweet data, we also built embedding sets from the general data and the combination of tweets with the general data. The general data consist of news articles, Wikipedia data and other web data. These ten embedding models were learned from about 400 million tweets and 7 billion words from the general text. In this paper, we also present two experiments demonstrating how to use the data sets in some NLP tasks, such as tweet sentiment analysis and tweet topic classification tasks.
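
To make the idea of a dense, real-valued word vector concrete, the sketch below compares toy embeddings with cosine similarity, a common way to measure how close two word vectors are. The three-dimensional vectors and the words in `toy_embeddings` are invented for illustration and do not come from the data sets described in this abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings; real models use far more dimensions.
toy_embeddings = {
    "happy": [0.9, 0.1, 0.3],
    "glad": [0.8, 0.2, 0.4],
    "tax": [-0.2, 0.9, -0.5],
}

print(cosine_similarity(toy_embeddings["happy"], toy_embeddings["glad"]))
print(cosine_similarity(toy_embeddings["happy"], toy_embeddings["tax"]))
```

With these made-up values, "happy" scores higher against "glad" than against "tax", which is the kind of behaviour trained embeddings are expected to show for related words.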

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Information Mining: Integrating Data Mining and Text Mining for Business Intelligence

Author: Li Quanzhi
Wu Yi-fang Brook
Publication venue: AIS Electronic Library (AISeL)
Publication date: 31/12/2006

AIS Electronic Library (AISeL)

Event Detection from Social Media Stream: Methods, Datasets and Opportunities

Author: Chao Yang
Li Dong
Li Quanzhi
Lu Yao
Zhang Chi
Publication date: 28/06/2023

Social media streams contain a large and diverse amount of information, ranging from daily-life stories to the latest global and local events and news. Twitter, especially, allows a fast spread of events happening in real time, and enables individuals and organizations to stay informed of the events happening now. Event detection from social media data poses different challenges from traditional text and is a research area that has attracted much attention in recent years. In this paper, we survey a wide range of event detection methods for Twitter data streams, helping readers understand the recent developments in this area. We present the datasets available to the public. Furthermore, a few research opportunities
Comment: 8 pages

arXiv.org e-Print Archive

Rumor Detection on Social Media: Datasets, Methods and Opportunities

Author: Li Quanzhi
Liu Yingchi
Si Luo
Zhang Qiong
Publication date: 01/01/2019

Social media platforms have been used for information and news gathering, and they are very valuable in many applications. However, they also lead to the spreading of rumors and fake news. Many efforts have been made to detect and debunk rumors on social media by analyzing their content and social context using machine learning techniques. This paper gives an overview of the recent studies in the rumor detection field. It provides a comprehensive list of datasets used for rumor detection, and reviews the important studies based on what types of information they exploit and the approaches they take. More importantly, we also present several new directions for future research.
Comment: 10 pages

arXiv.org e-Print Archive

Crossref

Generating Better Concept Hierarchies Using Automatic Document Classification

Author: Bot Razvan Stefan
Chen Xin
Li Quanzhi
Wu Yi-Fang Brook
Publication date: 23/04/2020

This paper presents a hybrid concept hierarchy development technique for web documents retrieved by a meta-search engine. The aim of the technique is to separate the initially retrieved documents into topically oriented categories, prior to the actual concept hierarchy generation. The topical categories correspond to different semantic aspects of the query. This is done using a 1-of-n automatic document classification on the initial set of returned documents. Then, an individual topical concept hierarchy is automatically generated inside each of the resulting categories. Both steps are executed on the fly at retrieval time. Due to the efficiency constraints imposed by the web retrieval context, the algorithm only uses document snippets (rather than full web pages) for both document classification and concept hierarchy generation. Experimental results show that the algorithm is able to improve the quality of the concept hierarchy presented to the searcher; at the same time, the efficiency parameters are kept within reasonable intervals.

CiteSeerX

International Asteroid Warning Network Timing Campaign : 2019 XS

Author: Balam David D.
Barkov Anatoly P.
Bauer James M.
Bertesteanu Daniel
Birlan Mirel
Bolin Bryce T.
Brucker Melissa J.
Buzzi Luca
Chambers Kenneth C.
Demetz Lukas
Djupvik Anlaug A.
Elenin Leonid
Elizabeth Elizabeth M.
Farnham Tony
Farnocchia Davide
Fini Paolo
Flynn Randy
Galli Gianni
Gao Xing
Gedek Marcin
Granvik Mikael
Hasubick Werner
Ivanov Alexander L.
Ivanov Viktor A.
Ivanova Natalya V.
Jaques Cristovao
Kasikov Anni
Kelley Michael S.
Kim Myung-Jin
Lane David
Lee Hee-Jae
Li Bin
Li Fan
Lister Tim
Lysenko Vadim E.
Magnier Eugene A.
Mahomed Nawaz
McCormick Jennie
Micheli Marco
Moon Darrel
Nastasi Alessandro
Nedelcu Dan A.
Neue Guenther
Payne Matthew J.
Petrescu Elisabeta
Popescu Marcel
Prosperi Enrico
Reddy Vishnu
Reszelewski Rafal
Roh Dong-Goo
Romanov Filipp D.
Santana-Ros Toni
Schmalz Anastasia
Schmalz Sergei
Scotti James V.
Seaman Robert
Sioulas Nick
Sonka Adrian B.
Tholen David J.
Trelia Madalina M.
Wainscoat Richard
Wang Xin
Wells Guy
Weryk Robert
Yakovenko Nikolai A.
Ye Quanzhi
Yim Hong-Suh
Zhai Chengxing
Zhang Chen
Zhao Haibin
Zhu Tinglei
Zolnowski Michal
Publication date: 01/07/2022

Peer reviewed

Repositorio Institucional de la Universidad de Alicante

Helsingin yliopiston digitaalinen arkisto